IBIS Macromodel Task Group

Meeting date: 03 September 2019

Members (asterisk for those attending):
ANSYS:                        Dan Dvorscak
                            * Curtis Clark
Cadence Design Systems:     * Ambrish Varma
                              Ken Willis
                              Kumar Keshavan
Intel:                        Michael Mirmak
Keysight Technologies:      * Fangyi Rao
                            * Radek Biernacki
                              Ming Yan
                              Stephen Slater
                              Maziar Farahmand
Mentor, A Siemens Business: * Arpad Muranyi
Micron Technology:          * Randy Wolff
                            * Justin Butterfield
SiSoft (Mathworks):         * Walter Katz
                              Mike LaBonte
SPISim:                     * Wei-hsing Huang
Teraspeed Labs:             * Bob Ross

The meeting was led by Arpad Muranyi.  Curtis Clark took the minutes.

--------------------------------------------------------------------------------
Opens:

- Arpad noted that he had added two new items to the agenda.
  - 6) Enabling Backchannel Interface in Statistical Mode
  - 7) BIRD198.1 draft review.
  

-------------
Review of ARs:

- Ambrish to send out a new draft of BIRD197.5 incorporating changes made during
  the meeting.
  - Done.
  
--------------------------
Call for patent disclosure:

- None.

-------------------------
Review of Meeting Minutes:

Arpad asked for any comments or corrections to the minutes of the August 27
meeting.

Ambrish noted one comment from Walter that hadn't been captured in the minutes.
The group had decided to remove this sentence from the previous draft:
  If DC_Offset is Usage In, the EDA tool may use the input value of DC_Offset to
  post process the data returned by the AMI model.
During the discussion surrounding this sentence, Ambrish recalled that Walter
had said, "we [EDA vendors] all agree... [on what to do with DC_Offset as an
In]".  Walter said he thought he'd said something like, "We [EDA vendors] all
agree on what we should be doing with DC_Offset as an input and as an output."
Walter noted that he thinks we all do agree.  Ambrish agreed, but he noted that
he had only been willing to remove that sentence because he believed there was
general consensus that it was clear to everyone that if DC_Offset were Usage In
the tool could add the input value to the Rx GetWave() output waveform if it
chose to do so.  Ambrish said with this comment recorded here he was happy with
the previous minutes.

Ambrish moved to approve the minutes.  Walter seconded the motion.
There were no objections.

-------------
New Discussion:

BIRD198.1:
As he had proposed doing at the previous meeting, Arpad presented a summary of
the new draft he, Bob, Randy, and Mike L. had received from the authors.  He
noted that it was a well written and thorough BIRD.  It proposed a few keywords
to describe an RC circuit between power and ground rails on the die.  The new
version incorporated some language to allow for the coexistence with IBIS 7.0
(BIRD189) interconnect model syntax.  It provided a truth table defining various
combinations with BIRD189 that were illegal conflicts, combinations that were
syntactically legal but could give questionable results if not done properly,
and combinations that were straightforward and had no problems.

Arpad noted that he understood that the goal of BIRD198.1 was simplicity, but he
expressed concern about adding a new way to do something that could be handled
with BIRD189 syntax.

Walter noted that he had also reviewed the new draft.  He said he thought they
wanted to do something very simple, but it had become overly complicated.  He
said he thought they really wanted to define an on-die circuit for every rail
node.  He thought it might be handled more easily by creating a new on-die
decap keyword, instead of a new model type.  Walter said he thought the simple
circuit topology in BIRD198.1 might be useful to have for power modeling, and
BIRD189 could be used for signal interconnect modeling.  Walter thought the
simple circuit concept was useful, but the proposed syntax was too confusing.
He said he didn't understand the [Model Selector]s in the proposal.

Walter said he would draft a simpler proposal using the two resistors and a
capacitor, but strictly between Pin signal_names.  Bob noted that the current
proposal uses bus_labels too.  Walter said he thought the authors included
bus_labels because BIRD189 did, and he thought it was making the proposal more
complicated than was necessary to accomplish their goal.

Arpad noted one other potential logistical issue.  The proposal introduced a
new Model_type, but it may not have addressed all the associated issues, for
example, Sub-Params like C_comp that are currently required for all [Model]s.
Bob noted that existing series model types were already different than other
model types [the spec states that C_comp is ignored for series models].  Arpad
said this was the type of explanation that might be needed for this proposal
too.  Walter said he would propose another option instead.

Walter took an AR to draft an alternate proposal that he thought would meet the
authors' requirements with a simpler syntax.  Bob said not to rule out
bus_labels arbitrarily.  He said that might be the best solution, and that's
how it's tied in with the I/O buffers.  Walter agreed and suggested we review
the options once he produces his alternate proposal.

Enabling Backchannel in Statistical Mode:
Walter shared a presentation introducing the topic.  Walter noted that rather
than draft a BIRD and bring it for review, he wanted to introduce the topic and
issues and then have brainstorming sessions to resolve the questions and build
the solution together.
- Justification (slide 3)
  - Requested by IC vendors and users.
  - Personal experience developing optimization algorithms for DDR5.
  - Desire to move the optimization algorithms into Tx and/or Rx models in an
    IBIS compliant way.
  - Enable all EDA companies to support BCI optimization in Statistical and/or
    Time Domain.
    
- How to do it?  (slides 4-6)
  - Will require a new function similar to AMI_Init().
  - Function name is yet to be decided.
  - Function won't allocate memory, it will use the handle allocated by Init().
  - Protocol will determine the information transferred between Tx and Rx .dlls
    (and any repeaters in the middle).
  - New Reserved parameter "BCI_Training_Type" to indicate if BCI Statistical
    mode is supported.  (e.g., "GetWave", "Both", "Statistical(name TBD)")
  - BCI_parameters_in and BCI_parameters_out as new function arguments.
  - Like Init(), the new function takes an IR in and returns an IR.
  
- Communication between the .dlls. (slide 7)
  - Could use file I/O as in BIRD147.  EDA tool would stay out of the BCI
    communication path itself and just call the new function(s) in the Tx and
    Rx iteratively.
  - Could use strings generated by the models and passed back and forth via the
    BCI_parameters_in and BCI_parameters_out arguments.
  - Other suggestions?

- Statistical Training Flow (slide 8)
  - EDA tool will alternately call this new function in the Tx and Rx if:
    - Training is On
    - BCI Statistical is allowed and enabled on the Tx and Rx.
  - Training stops when "converged" (or a failure occurs).
  - Results from last call to the function should be used for statistical
    analysis.
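
For illustration only, the alternating flow on slide 8 can be sketched as
below.  This is a minimal sketch under stated assumptions, not the proposal
itself: the name bci_train(), the dict-based parameters, the "status" and
"metric" keys, and the toy Tx/Rx classes are all hypothetical, since the
function name and the communication mechanism are still to be decided.

```python
# Illustrative sketch only.  bci_train(), the dict parameters, and the toy
# Tx/Rx classes are hypothetical; nothing here is defined by IBIS.

MAX_TAP = 10  # tap index k maps to an FFE post-cursor weight of k / 10


class ToyTx:
    """Toy controller-side model that owns the optimization."""

    def __init__(self):
        self.tap = 0
        self.best = None  # (metric, tap) of the best setting seen so far

    def bci_train(self, ir, bci_parameters_in):
        status = "training"
        metric = bci_parameters_in.get("metric")
        if metric is not None:
            if self.best is None or metric < self.best[0]:
                # Last setting improved the metric: remember it, try the next.
                self.best = (metric, self.tap)
                if self.tap < MAX_TAP:
                    self.tap += 1
                else:
                    status = "converged"
            else:
                # Metric got worse: fall back to the best setting and stop.
                self.tap = self.best[1]
                status = "converged"
        c = self.tap / 10
        # One-tap FFE applied to the incoming IR: y[n] = x[n] - c * x[n-1]
        ir_out = [x - c * (ir[n - 1] if n > 0 else 0.0)
                  for n, x in enumerate(ir)]
        return ir_out, {"tap": self.tap, "status": status}


class ToyRx:
    """Toy receiver that only reports a metric (residual post-cursor ISI)."""

    def bci_train(self, ir, bci_parameters_in):
        cursor = max(range(len(ir)), key=lambda n: abs(ir[n]))
        metric = sum(abs(x) for x in ir[cursor + 1:])
        return list(ir), {"metric": metric}


def run_statistical_training(tx, rx, channel_ir, max_iterations=1000):
    """EDA-tool side: call Tx then Rx alternately until convergence."""
    rx_out = {}
    for iteration in range(1, max_iterations + 1):
        tx_ir, tx_out = tx.bci_train(channel_ir, rx_out)
        rx_ir, rx_out = rx.bci_train(tx_ir, tx_out)
        if tx_out["status"] == "converged":
            # The IR from the last call feeds the statistical analysis.
            return rx_ir, tx_out["tap"], iteration
    raise RuntimeError("training did not converge")
```

In this sketch the EDA tool never interprets the parameters; it only passes
each model's output to the other model, which matches the "conduit" role
discussed for the tool.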
    
Ambrish asked how many back-and-forth iterations one might expect during the
training phase in Statistical flow.  Walter said he'd had a good bit of fun
working on a DDR5 DQ write example using the same channel step response from
his DesignCon presentations.  So, with a real channel and hopefully real DDR5
buffer models, some of the optimization algorithms had taken 700 iterations
to converge, and he'd gotten that down to 250, perhaps down to 70 with a good
initial guess.  So, he suggested on the order of several hundred to 1000
iterations, and said he could imagine convergence taking longer for more
complicated channels.

Ambrish noted that unlike the GetWave() bit-by-bit flow where behavior changes
over multiple GetWave() blocks, here we only had a single IR in Statistical
training.  Couldn't the IR be analyzed in one or two iterations?  Walter noted
that if compute power were unlimited, in his DDR5 example one could take the
3 FFE taps, and the 4 DFE taps, each of which had 30 settings, and ignoring
gain or other parameters still have (30)^7 combinations.  But in practice 
you can't just use brute force and run every combination.  So we need a way to
start and head toward the optimum solution.  Gradient search and many other
algorithms could be used to avoid trying all the combinations.  Try some
settings, convert the modified IR to an eye or COM or some other metric and
optimize based on that.
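
As a rough illustration of the numbers above, 7 taps with 30 settings each
give 30^7 = 21,870,000,000 combinations.  The sketch below compares that count
with the number of metric evaluations a simple coordinate search needs.  The
search routine, the toy metric (squared distance to a made-up target), and the
target values are assumptions for illustration; a real metric such as eye
height or COM need not be this well behaved, and this is not the algorithm
used in Walter's DDR5 example.

```python
# Illustrative sketch only: toy metric and made-up target, not a real eye/COM.

N_TAPS = 7        # 3 FFE taps + 4 DFE taps
N_SETTINGS = 30   # settings per tap

BRUTE_FORCE_COMBINATIONS = N_SETTINGS ** N_TAPS  # 21,870,000,000

TARGET = [12, 3, 27, 8, 15, 0, 21]  # hypothetical optimum, one index per tap


def toy_metric(settings):
    """Lower is better; stands in for a metric derived from the modified IR."""
    return sum((s - t) ** 2 for s, t in zip(settings, TARGET))


def coordinate_search(metric, n_params, n_settings, start=None):
    """Step one tap at a time by +/-1, keeping any move that improves metric."""
    current = list(start) if start is not None else [0] * n_params
    best = metric(current)
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for i in range(n_params):
            for step in (-1, 1):
                candidate = current.copy()
                candidate[i] += step
                if not 0 <= candidate[i] < n_settings:
                    continue  # outside the legal range for this tap
                evaluations += 1
                score = metric(candidate)
                if score < best:
                    current, best = candidate, score
                    improved = True
    return current, best, evaluations


cold_result, _, cold_evals = coordinate_search(toy_metric, N_TAPS, N_SETTINGS)
warm_start = [11, 4, 26, 8, 14, 1, 20]  # a "good initial guess" near the target
warm_result, _, warm_evals = coordinate_search(
    toy_metric, N_TAPS, N_SETTINGS, start=warm_start)
```

On this toy problem both runs reach the target with a few hundred evaluations
or fewer, and the run with the better initial guess needs fewer evaluations
than the run started from all zeros.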

Ambrish asked if all this would happen in the Rx, and noted that we can't
legislate what the Rx needs to do to determine its optimal set-up.  Walter noted
that the DDR5 Rx model might be stupid.  It has no DFE optimization or CDR; it
just expects to be told the tap weights and have the skew set.  In DDR5 DQ Write
protocol, it's the DDR5 controller (Tx) that is controlling the Rx.  That's the
way the real hardware works, and the memory model maker is going to want the
controller vendor to write the optimization algorithms into the Tx model.
Wei-hsing asked if this optimization could be done in the EDA tool rather than
in the model.  Walter said that the optimization algorithms that are developed
will go into the Tx.  We are enabling this flow to happen if we have the EDA
tool alternately calling the new function in the Tx and Rx and letting them
negotiate.  Wei-hsing said the EDA tool wouldn't need to do the full sweep of
all the combinations, and the EDA tool could control the optimization flow and
call the new function with different settings and follow its own optimization
path.  Ambrish said the lesson from BIRD147 was that it would be hard to define
that flow for the EDA tool in the spec, and it's easier to have the EDA tool
be the conduit that allows the AMI Tx and Rx to communicate.  Walter noted that
EDA tools providing the optimization flow had so far been the only option.  But
what customers wanted was the optimization to be done by the appropriate model
.dll and in a flow compliant with the IBIS standard.

- How to proceed (slide 9)
  - Brainstorming session to:
    - Determine the name of the new function
    - Determine the communication mechanism
    - Determine new Reserved Parameters.
    - Develop a DDR5 DQ Write BCI Protocol
      - Invite additional memory and controller vendors.
    - Develop a Generic Tx N-tap FFE BCI Protocol.

Walter noted that in real hardware the controller has a training process to
determine the right value of VREFDQ to set in the buffer.  Ideally it would be
the same as the DC_Offset, but in fact the DC_Offset has nothing to do with
the register that sets VREFDQ.  DC_Offset has to do with the step response at
the input to the buffer, it's related to the single-ended waveform coming in
to the receiver.  But the Rx may not be set (register setting) to that voltage,
and model makers might like to return the actual value of that reference
voltage.  Walter suggested that this might be a well-defined value to return as
the output value of DC_Offset, so we might consider this in the DC_Offset
BIRD.  This was why he'd asked to table the DC_Offset BIRD discussion until
we discussed this topic.

Fangyi noted that individual Tx models would be training individual Rx models
in this proposal.  He said in the real world, multiple Txs and multiple Rxs
might share the same settings, for example per nibble or per byte settings
of the Txs and Rxs.  He said this might be a limitation of this proposal if
we can't force different Txs and Rxs to share the same settings.  Walter said
it was a possibility that you might want to train on one Tx/Rx pair and apply
it to others.  He said that thus far customers were asking for training of
individual channels, but Fangyi had posed an interesting question.

Fangyi said real systems were training per nibble or per byte (4DQ or 8DQ
sharing the same settings).  Walter asked if this was a requirement or just
something that had been done.  Justin said DRAM had the capability to set DFE
taps per DQ, so that possibility should be covered.  Walter said it might be
likely that all the routing for a nibble would be identical, and you'd end up
with the same outcome if you optimized them individually or as a group.  Fangyi
agreed this was possible.

Fangyi said VREFDQ might be somewhat different, and that it was also shared by
a nibble or byte lane.  A particular register might be set that applied to an
entire nibble.  Walter agreed this was a possibility.  Randy said he wasn't sure
how much we could say since the spec hasn't been published, but perhaps it's a
possibility there's per DQ VREFDQ adjustment as well.  Walter said as a
practical matter all the routing for a nibble will be almost identical, the
models will be the same, the DC_Offset will be the same.  Hardware may train
them individually, or all 4 at once.  If they're all the same, you only have to
train on one.  If there are differences in length, etc., then it becomes an
interesting problem if you only have one set of settings for all 4.  Fangyi
noted that in practice the physical or electrical lengths can differ even within
a byte lane.  The timing skew can be calibrated per bit, which is an indication
of different electrical lengths for bits within a byte lane.  Walter agreed.

Fangyi asked if this training is limited to using strictly statistical methods.
Walter said it takes an IR in, and it returns an IR, but nothing would prevent
the model from generating a time domain waveform to use during training.  Fangyi
agreed.  Walter said the time domain waveform could be used to handle
non-linearities, and even to prepare for future GetWave() simulation, but only
the linearized version could be returned in the final IR.

- Walter: Motion to adjourn.
- Curtis: Second.
- Arpad: Thank you all for joining.

AR: Walter to draft his alternate proposal for BIRD198.1.

-------------
Next meeting: 10 September 2019 12:00pm PT
-------------

IBIS Interconnect SPICE Wish List:

1) Simulator directives